Implicit regularization is an important lens for interpreting neural networks. Recent theory has begun to explain implicit regularization through the model of deep matrix factorization (DMF), analyzing the trajectory of discrete gradient dynamics during optimization. These discrete gradient dynamics use steps that are relatively small but not infinitesimal, and thus match the practical implementation of neural networks well. So far, discrete gradient dynamics analysis has been applied successfully to shallow networks, but for deep networks it runs into the difficulty of complex computation. In this work, we introduce another discrete gradient dynamics approach to explaining implicit regularization, namely landscape analysis. It focuses mainly on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
translated by Google Translate
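Real-time data collection and analysis in large experimental facilities present a great challenge across multiple domains, including high-energy physics, nuclear physics, and cosmology. To address this, machine learning (ML) based methods for real-time data compression have drawn significant attention. However, unlike natural image data such as CIFAR and ImageNet, which are relatively small and continuous, scientific data often arrive at high rates as three-dimensional data volumes with high sparsity (many zeros) and non-Gaussian value distributions. This makes popular ML compression methods, as well as conventional data compression methods, suboptimal. To address these obstacles, this work introduces a dual-head autoencoder that resolves sparsity and regression simultaneously, called the \textit{Bicephalous Convolutional AutoEncoder} (BCAE). The method shows advantages in both compression fidelity and compression ratio over traditional data compression methods such as MGARD, SZ, and ZFP. To achieve similar fidelity, the best performer among the traditional methods reaches only half the compression ratio of BCAE. Moreover, a thorough ablation study of the BCAE method shows that a dedicated segmentation decoder improves the reconstruction.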
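Magnetic resonance spectroscopy (MRS) is a noninvasive tool for revealing metabolic information. One challenge of 1H-MRS is its low signal-to-noise ratio (SNR). To improve the SNR, a typical approach is to perform signal averaging (SA) over M repeated samples. The data acquisition time, however, increases M-fold accordingly, and a complete clinical MRS scan takes approximately 10 minutes at the common setting M = 128. Recently, deep learning has been introduced to improve the SNR, but most such methods use simulated data as the training set. This may hinder MRS applications, since potential differences, such as imperfections of the acquisition system and physiological and psychological conditions, may exist between simulated and in vivo data. Here, we propose a new scheme that uses purely the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), is designed to learn the mapping from low-SNR time-domain data (24 SA) to high-SNR data (128 SA). Experiments on in vivo brain spectra from 7 healthy subjects, 2 brain tumor patients, and 1 cerebral infarction patient show that, using only 20% of the repeated samples, the spectra denoised by ReLSTM provide metabolite estimates comparable to those from 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieves lower relative error and lower Cramér-Rao lower bounds when quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of spectra under fast acquisition (24 SA), which would be valuable for clinical MRS studies.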
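The signal-averaging trade-off described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: it assumes independent zero-mean Gaussian noise on each repeated sample and acquisition time that scales linearly with the number of repeats. The names `scan_minutes`, `averaged_noise_std`, and `SCAN_MINUTES_AT_128` are hypothetical. Under those assumptions, averaging M repeats lowers the noise standard deviation by roughly a factor of sqrt(M).

```python
import math
import random
import statistics

SCAN_MINUTES_AT_128 = 10.0  # approximate figure quoted in the abstract


def scan_minutes(m, full_m=128, full_minutes=SCAN_MINUTES_AT_128):
    """Acquisition time, assuming it scales linearly with the number of repeats."""
    return full_minutes * m / full_m


def averaged_noise_std(m, trials=2000, sigma=1.0, seed=0):
    """Empirical noise std after averaging m repeats.

    Assumption: each repeat carries i.i.d. zero-mean Gaussian noise with
    standard deviation sigma; real MRS noise may deviate from this.
    """
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(m)) / m
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for m in (24, 128):
        # Columns: repeats, minutes, simulated noise std, theoretical 1/sqrt(M)
        print(m, scan_minutes(m), averaged_noise_std(m), 1.0 / math.sqrt(m))
```

Note that 24 of 128 repeats is 18.75%, which matches the "20% of the repeated samples" figure, and under the linear-time assumption corresponds to a scan of just under 2 minutes.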
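In recent years, with the development of deep neural networks, end-to-end optimized image compression has made significant progress and has surpassed classical methods in rate-distortion performance. However, most learning-based image compression methods are label-free and do not consider image semantics or content when optimizing the model. In fact, the human eye has different sensitivities to different content, so image content also needs to be taken into account. In this paper, we propose a content-oriented image compression method that handles different kinds of image content with different strategies. Extensive experiments show that the proposed method achieves competitive subjective results compared with state-of-the-art end-to-end learned image compression methods and with classical methods.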
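The task of learning a probability distribution from samples is ubiquitous across the natural sciences. The output distributions of local quantum circuits form a particularly interesting class of distributions, of key importance both to quantum advantage proposals and to a variety of quantum machine learning algorithms. In this work, we provide an extensive characterization of the learnability of the output distributions of local quantum circuits. Our first result yields insight into the relationship between the efficient learnability and the efficient simulatability of these distributions. Specifically, we prove that the density modelling problem associated with Clifford circuits can be solved efficiently, while for depth $d = n^{\Omega(1)}$ circuits the injection of a single $T$-gate into the circuit renders this problem hard. This result shows that efficient simulatability does not imply efficient learnability. Our second set of results provides insight into the potential and limitations of quantum generative modelling algorithms. We first show that the generative modelling problem associated with depth $d = n^{\Omega(1)}$ local quantum circuits is hard for any learning algorithm, classical or quantum. As a consequence, one cannot use a quantum algorithm to gain a practical advantage for this task. We then show that, for a wide variety of the most practically relevant learning algorithms (including hybrid quantum-classical algorithms), even the generative modelling problem associated with depth $d = \omega(\log(n))$ Clifford circuits is hard. This result places limitations on the applicability of near-term hybrid quantum-classical generative modelling algorithms.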
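In this paper, we study the problem of jointly estimating optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches face a dilemma: they fail either to fully exploit the characteristics of each modality or to maximize inter-modality complementarity. To address this problem, we propose a novel end-to-end framework called CamLiFlow. It consists of 2D and 3D branches with multiple bidirectional connections between them at specific layers. Unlike previous work, we apply a point-based 3D branch to better extract geometric features, and we design a symmetric learnable operator to fuse dense image features with sparse point features. We also propose a transformation to address the non-linearity of the 3D-2D projection. Experiments show that CamLiFlow achieves better performance with fewer parameters. Our method ranks first on the KITTI Scene Flow benchmark, outperforming the previous state of the art with 1/7 of the parameters. Code will be made available.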
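With the development of deep learning techniques, the combination of deep learning and image compression has drawn a great deal of attention. Recently, learned image compression methods have surpassed their classical counterparts in rate-distortion performance. However, continuous rate adaptation remains an open question. Some learned image compression methods use multiple networks for multiple rates, while others use a single model at the expense of increased computational complexity and degraded performance. In this paper, we propose a continuously rate-adjustable learned image compression framework, the Asymmetric Gained Variational Autoencoder (AG-VAE). AG-VAE uses a pair of gain units to achieve discrete rate adaptation in a single model with negligible additional computation. Then, by using exponential interpolation, continuous rate adaptation is achieved without compromising performance. In addition, we propose an asymmetric Gaussian entropy model for more accurate entropy estimation. Exhaustive experiments show that our method achieves quantitative performance comparable to SOTA learned image compression methods and better qualitative performance than classical image codecs. In the ablation study, we confirm the usefulness and superiority of the gain units and the asymmetric Gaussian entropy model.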
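The abstract does not spell out the interpolation rule, so the following is only one plausible reading of "exponential interpolation": an element-wise geometric blend of two gain vectors learned for neighbouring rate points, applied as a channel-wise scale on the latent. The function names `interpolate_gain` and `apply_gain`, the coefficient `l`, and the example values are all assumptions made for illustration, not the authors' code.

```python
def interpolate_gain(gain_a, gain_b, l):
    """Element-wise geometric blend gain_a**l * gain_b**(1 - l), with 0 <= l <= 1.

    Assumption: all gains are strictly positive, so fractional powers are real.
    l = 1 returns gain_a, l = 0 returns gain_b, values in between give
    intermediate gains (and, presumably, intermediate rates).
    """
    if not 0.0 <= l <= 1.0:
        raise ValueError("l must lie in [0, 1]")
    if len(gain_a) != len(gain_b):
        raise ValueError("gain vectors must have the same length")
    return [(a ** l) * (b ** (1.0 - l)) for a, b in zip(gain_a, gain_b)]


def apply_gain(latent, gain):
    """Scale each latent channel by its gain before quantization (hypothetical step)."""
    return [y * g for y, g in zip(latent, gain)]


if __name__ == "__main__":
    high_rate_gain = [4.0, 1.0]  # made-up values for two latent channels
    low_rate_gain = [1.0, 1.0]
    mid_gain = interpolate_gain(high_rate_gain, low_rate_gain, 0.5)
    print(mid_gain, apply_gain([0.5, -2.0], mid_gain))
```

A geometric blend is used here, rather than a linear one, because it keeps the gains positive and interpolates evenly on a logarithmic scale; whether this matches the paper's exact formula should be checked against the paper itself.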
Model compression is a critical technique for efficiently deploying neural network models on mobile devices, which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore a large design space, trading off among model size, speed, and accuracy; this process is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC), which leverages reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio, better preserving accuracy, and freeing human labor. Under 4× FLOPs reduction, we achieved 2.7% better accuracy than the hand-crafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved a 1.81× speedup of measured inference latency on an Android phone and a 1.43× speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm that effectively prunes each layer through LASSO-regression-based channel selection and least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results, with a 5× speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks such as ResNet and Xception, which suffer only 1.4% and 1.0% accuracy loss, respectively, under 2× speed-up, which is significant. Code has been made publicly available.
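To make the link between channel pruning and compute concrete, here is a rough sketch of multiply-accumulate (MAC) counting for a single convolution layer before and after pruning. It is an illustration under simplifying assumptions (a dense, ungrouped convolution with bias ignored), and the function names and layer sizes are hypothetical rather than taken from the paper. A theoretical MAC reduction is also not the same thing as a measured speed-up on real hardware.

```python
def conv_macs(h_out, w_out, c_in, c_out, kernel=3):
    """MACs of a dense convolution: one kernel*kernel*c_in dot product
    per output channel at every output position."""
    return h_out * w_out * c_in * c_out * kernel * kernel


def pruned_conv_macs(h_out, w_out, c_in, c_out, keep_in, keep_out, kernel=3):
    """MACs after keeping a fraction of the input and output channels.

    keep_in and keep_out are fractions in (0, 1]; at least one channel
    is always retained on each side.
    """
    kept_in = max(1, round(c_in * keep_in))
    kept_out = max(1, round(c_out * keep_out))
    return conv_macs(h_out, w_out, kept_in, kept_out, kernel)


if __name__ == "__main__":
    # Hypothetical layer: 56x56 output map, 64 -> 128 channels, 3x3 kernel.
    full = conv_macs(56, 56, 64, 128)
    half_in = pruned_conv_macs(56, 56, 64, 128, keep_in=0.5, keep_out=1.0)
    half_both = pruned_conv_macs(56, 56, 64, 128, keep_in=0.5, keep_out=0.5)
    print(full, full / half_in, full / half_both)
```

Because the cost is proportional to the product of input and output channel counts, halving only the input channels of this layer halves its MACs, while halving both sides cuts them by a factor of four.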
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing \textbf{SoTA} CNN-/Transformer-based models while trading off model accuracy and efficiency well.